Towards automatic cross-lingual transfer of semantic annotation

Author

  • Diana Trandabat
Abstract

Semantic labeling systems are most commonly developed by supervised learning from an annotated corpus. What if short deadlines and limited human and financial resources prevent us from building such a training corpus for our language? If such a corpus already exists for another language, this paper proposes a method to import it automatically into the language we need. The transfer method is based on translating the existing corpus (or using annotated versions of existing parallel texts), aligning the two versions at word level, and applying a set of mapping functions to import the annotation from one language to the other. An import validation interface is also provided for the manual validation of the resulting resource. As an example, the import of semantic roles from the English FrameNet into Romanian is discussed.
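
The projection step described in the abstract (word alignment followed by mapping functions) can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `project_roles`, the span representation, the covering-span heuristic, the example sentence pair, and the hand-written alignment are all assumptions made for illustration. The role labels follow the FrameNet Commerce_sell frame.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # inclusive (start, end) token indices


def project_roles(
    source_roles: Dict[str, Span],
    alignment: List[Tuple[int, int]],
) -> Dict[str, Span]:
    """Project role spans from source tokens onto target tokens.

    Hypothetical mapping function: each role is mapped to the smallest
    contiguous target span covering every target token aligned to a
    source token inside the role's span. Roles with no aligned target
    token are dropped (they would need manual validation).
    """
    projected: Dict[str, Span] = {}
    for role, (start, end) in source_roles.items():
        targets = [t for s, t in alignment if start <= s <= end]
        if targets:
            projected[role] = (min(targets), max(targets))
    return projected


# Illustrative sentence pair (assumed, not taken from the paper).
english = ["John", "sold", "the", "car", "to", "Mary"]
romanian = ["Ion", "a", "vândut", "mașina", "Mariei"]

# Hand-written word alignment as (source index, target index) pairs.
alignment = [(0, 0), (1, 1), (1, 2), (2, 3), (3, 3), (4, 4), (5, 4)]

# Source-side annotation: role label -> inclusive token span.
source_roles = {
    "Seller": (0, 0),
    "Target": (1, 1),
    "Goods": (2, 3),
    "Buyer": (4, 5),
}

projected = project_roles(source_roles, alignment)
for role, (start, end) in projected.items():
    print(role, romanian[start:end + 1])
```

In practice the alignment would come from an automatic word aligner, and the projected spans would still pass through manual validation, as the abstract notes.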

Similar resources

Scaling up Automatic Cross-Lingual Semantic Role Annotation

Broad-coverage semantic annotations for training statistical learners are only available for a handful of languages. Previous approaches to cross-lingual transfer of semantic annotations have addressed this problem with encouraging results on a small scale. In this paper, we scale up previous efforts by using an automatic approach to semantic annotation that does not rely on a semantic ontology...

RECSA: Resource for Evaluating Cross-lingual Semantic Annotation

In recent years large repositories of structured knowledge (DBpedia, Freebase, YAGO) have become a valuable resource for language technologies, especially for the automatic aggregation of knowledge from textual data. One essential component of language technologies, which leverage such knowledge bases, is the linking of words or phrases in specific text documents with elements from the knowledg...

Translational Equivalence and Cross-lingual Parallelism: The Case of FrameNet Frames

Annotation projection is a strategy for the cross-lingual transfer of annotations which can be used to bootstrap linguistic resources for low-density languages, such as role-semantic databases similar to FrameNet. In this paper, we investigate the main assumption underlying annotation projection, cross-lingual parallelism, which states that annotation is parallel across languages. Concentrating...

Cross-Lingual Validity of PropBank in the Manual Annotation of French

Methods that re-use existing mono-lingual semantic annotation resources to annotate a new language rely on the hypothesis that the semantic annotation scheme used is cross-lingually valid. We test this hypothesis in an annotation agreement study. We show that the annotation scheme can be applied cross-lingually.

Exploiting Knowledge Bases for Multilingual and Cross-lingual Semantic Annotation and Search

The amount of entities in large knowledge bases (KBs) has been increasing rapidly, making it possible to propose new ways of intelligent information access. In addition, there is an impending need for systems that can enable multilingual and cross-lingual information access. In this work, we firstly demonstrate X-LiSA, an infrastructure for multilingual and cross-lingual semantic annotation, wh...

Linguistic Annotation for the Semantic Web

Establishing the semantic web on a large scale implies the widespread annotation of web documents with ontology-based knowledge markup. For this purpose, tools have been developed that allow for semi-automatic annotation of web documents with ontology-based metadata. However, given that a large number of web documents consist either fully or at least partially of free text, language technology ...

Publication date: 2011